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ABSTRACT 

Context. Non-parametric lensing methods are a useful way of reconstructing the lensing mass of a cluster without making assumptions 
about the way the mass is distributed in the cluster. These methods are particularly powerful in the case of galaxy clusters with a large 
number of constraints. The advantage of not assuming implicitly that the luminous matter follows the dark matter is particularly 
interesting in those cases where the cluster is in a non-relaxed dynamical state. On the other hand, non-parametric methods have 
several limitations that should be taken into account carefully. 

Aims. We explore some of these limitations and focus on their implications for the possible ring of dark matter around the galaxy 
cluster CL0024+17. 

Methods. We project three background galaxies through a mock cluster of known radial profile density and obtain a map for the arcs 
(6 map). We also calculate the shear field associated with the mock cluster across the whole field of view (3.3 arcmin). Combining 
the positions of the arcs and the two-direction shear, we perform an inversion of the lens equation using two separate methods, the 
biconjugate gradient, and the quadratic programming (QADP) to reconstruct the convergence map of the mock cluster 
Results. We explore the space of the solutions of the convergence map and compare the radial density profiles to the density profile 
of the mock cluster. When the inversion matrix algorithms are forced to find the exact solution, we encounter systematic effects 
resembling ring structures, that clearly depart from the original convergence map. 

Conclusions. Overfitting lensing data with a non-parametric method can produce ring-like structures similar to the alleged one in 
CL0024. 

Key words, gravitational lensing: strong ~ gravitational lensing: weak 
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1. introduction 

Gravitational lensing is one of the most powerful probes of dark 
matter In particular, galaxy clusters host the strongest gravita- 
tional potentials in the Universe, hence they are rich in gravi- 
tational lensing effects. The distortions produced in the images 
of background galaxies by a galaxy cluster can be used to re- 
construct the mass distribution of the cluster, which is believed 
to be largely dominated by dark matter. Two regimes are dis- 
tinguished according to the strength of the lensing distortion. 
The weak lensing regime refers to small distortions that usu- 
ally need to be studied in a statistical way. Large distortions, 
on the other hand, can be studied individually (or in pairs) and 
they are referred to as strong lensing. Strong lensing occurs 
when the projected surface mass density is on the order of the 
critical mass density 'Lcm- In this scenario, a gravitational lens 
bends the light in such a way that it can produce multiple im- 
ages (arcs) of the same background galaxy. Each multiple im- 
age can be used as a constraint of the mass distribution. The 
mass distribution has to be such that, when projected back into 
the source plane, the multiple images concentrate (or focus) into 
the same point. In most cases, the number of multiple images is 
small, which results in few consttaints. If only sttong lensing is 
available and the number of constraints is small, one needs to 
rely on parametric methods. However, more and more often new 
data reveals large numbers of multiple images around a single 
cluster. The cluster A 1689 is probably the most spectacular ex- 



ample t o date where hu ndreds of arcs can be seen around the 
cluster (iBroadhurst et a l. 2005a.b). When the number of con- 
straints is sufficiently large, non-parametric methods become 
competitive with the parametric ones and with the advantage 
that no a priori assumption is made about the mass distribution 
of the cluster. Non-parametric methods applied to lensing mass 
r econ s truction have been studied in the past |Saha & Williarn| 
1997; Abdelsalam et al. 1998a; Bridle et al. 1998; Seitz et a!] 
1998; Kneib et al. 2003; Diego et al...2005a,b;, Smith et aL 200^ 
Bradacetal. 2005; Halkola et al.1 120061: ICacciato et al.ll2006l) V 
On the positive side, in cases where the number of constraints 
is large, the results obtained with the parametric and non- 
parametric methods agree well (Diego et al. 2005b) probing, 
among other things, that the dark matter does trace the lumi- 
nous matter and the usefulness of non-parametric methods as a 
way of testing that the assumptions made in the parametric meth- 
ods are well founded. Non-parametric methods have been used 
as well t o combine weak and strong lensing data in the same 
analv sis dXbdelsa am et alJll998bl:lBridle et al.lll 998^ 'Saha et al ' 



1999t Kneib et a 



Diego etal.ll2007h . 



I2003t ISmithet al.1 120051 Uradac et al. 200l 



On the other hand, non-parametric methods have a series 
of limitations. In this paper we explore one of these limitations 
related to the limited resolution in the mass reconstruction 
and its connection with the accuracy in the reconstructed arc 
positions. 
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The results of this paper may have implications for the re- 
sults of ' Jee et aP (12007' ). who use a non-parametric method and 
find an unusual ring of dark matter around the cluster While we 
do not question the validity of these interesting results, we ex- 
plore the possibility that spurious structures might appear when 
using non-parametric methods if the limitations of parametric 
methods are not taken into account in the analysis. 

1.1. A ring of dark matter around CL0024+ 1 7? 

The cluster CL0024+17 (z = 0.395) was one of the first fo r 
which strong lensing was observed dWalhngton et al.l Il992h . 
Four strongly lens ed arcs can be clearly seen around the tang en- 
tial critical curve dSmail et al.lll996l:lBroadhurst et al.ll200o( see 
also). These arcs have been used to constrain the mass in the cen- 
tral region of the cluster dCpl lev et al. 1996; Tvson et al. 199^ 
iBroadhurst et al.ll2000l:IComerford et alJ2006l) . These mass con- 
straints have been compared wit h those derived from X-ray 
measure ments with CHAN DRA dOta et all l2004 and XMM- 
Newton dZhang et al.ll2005() . These authors estimated that the X- 
ray masses are a factor 3-4 lower than the lensing masses. This 
discrepancy has been interpreted as a sign that the cluster is not 
in hyd rostatic e q uilibr ium. 

In IJee et al ] d2007h . the authors reconstruct the mass of the 
cluster out to 100 arcseconds from its center This corresponds 
to a physical size of 0.389 Mpc for an object located at z ^ 0.4. 
In their analysis, they combine strong and weak lensing with 
a non-parametric method. The authors find a dark matter ring 
surrounding the cl uster core, at r » 75 arcseconds from the 
center (Fig. 10 in Ijee et alJ l2007h . The authors suggest that 
this ring might be the result of a hig h speed collision be tween 
two clusters along the line of sight dCzoske et al. 200 ih i n an 
scenario similar to the 'bullet clusteP~ (lciowe et" 1200 6^) but 
with the difference that in that case the collision is perpendicular 
to the Une of sight. 

Whether the existence of the dark matt er ring is real or not 
has been debated by many other authors (Milgrom & Sanders' 
2008LOin et al. 2008; Zu Hone et al. 2009; Zitrin et al. 2009; 
Umetsu et al.ll2010l) . lMilgrom & Sanders! (2008) reconstruct the 



radial profile of the mass assuming a model based on modi- 
fied Newtonian dynamics (or MOND). The authors claim that 
a ringlike structure appears at the MOND transition region (see 
figs. 3 and 4 in their paper). According to the authors , CL0024 
can be considered as a robust probe of MOND. In lOin et al] 
d2008h . the authors study the distribution of galaxies in CL0024, 
which, being collisionless, should exhibit a similar ring-like pat- 
tern. On the basis of 295 counts, the authors find no evidence 
of a ring in the distr ibution of galaxies. In a different paper, 
IZu Hone et al.l d2009h use a hydrodynamical simulation of two 
coUisioning clusters to compute the radial profiles after the col- 
lision. They find no evidence of either a dip or ring in the radial 
profile outside the core radius after the collision. They conclude 
that a ring-like feature could only be explained by an unlikely 
and highly tuned set of initial conditions before the collision. 

To reanalyze the lensing data for CL0024, Zitr in et aP 
d2009l) analyze this cluster using data from the Hubble Space 
Telescope (HST) instrument ACS/NIC3. The dark matter 
distribution profile was reconstructed using a SL parametric 
method based on six free parameters. The results presented in 
Fig. 1 and Fig. 2 of their pa per reveal neither a dip nor ring in 
the profiles. Finallv. lUmetsu et al. (2010) combine a large field 
of view data set from the SUBARU telescope with data from 
HST ACS/NIC3, finding no evidence of the ringlike structure 



after the mass reconstruction (see Fig. 21 of their paper). 

In this paper, we revisit the debate using a non-parametric 
method similar to that used in iJee et alJ d2007[) but applied to 
simulated data (weak and strong lensing). The advantage of 
using simulations is that the underlying dark matter distribution 
and the position and redshifts of the background sources are 
perfectly known. This offers the unique possibility of compar- 
ing the optimal solution with the multiple possible solutions 
obtained by the non-parametric method. We can also explore 
the space of solutions obtained when the minimization is done 
under different assumptions and compare with the original mass 
distribution. 

In Sections |2] and [3] we introduce the fundamentals of the 
gravitational lensing and the non-parametric method used in 
this paper for the mass reconstruction. In SectionlH we describe 
the mock data used in our analysis. In Section|5] we present the 
results obtained by our non-parametric method and compare the 
different solutions with the optimal one. Finally, in Section |6] 
we discuss our results and in Section|7]our conclusions. 



2. Gravitational lensing basics 

In gravitational lensing, it is usual to adopt the thin lens ap- 
proximation because the cosmological distances between the ob- 
server, the lens, and the sources are much greater than the size 
of the lens. Hence, the lens can be treated as a plane. All the 
other elements in the lensing problem are also assumed to be lo- 
cated in planes. When there are multiple background galaxies, 
each one is assumed to be in a different plane with redshift z, (in 
the case of strong lensing) or in the same plane at the average 
redshift z (in the case of weak lensing). All these planes are per- 
pendicular to the line of sight and the deflection is assumed to 
occur instantly when the light crosses the lens plane. 
We define Dis as the angular diameter distance between the 
source plane and the lens plane and D,,! and Dos as the angu- 
lar diameter distances from the observer to the lens and from 
the observer to the sources, respectively. With respect to the 
line of sight, the sources are located at angular positions 
(/ = 1,2, ...,« with n the number of sources), while the lensed 
images are located at positions 0, (/ - 1 , 2, m with m the num- 
ber of images). We define the equation of the lens 



(1) 



We denote by ij/iff) the two-dimensional potential produced by 
all the masses located at 6' 



I), 



(2) 



where S(0') is the surface density of the cluster at the given po- 
sition 9' . The part outside the integral is related to the critical 
density 



^crif — 



(3) 
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The above equation is used in the definition of the convergence 



(4) 
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The deflection angle a and the convergence can be expressed as 
derivatives of the two-dimension potential 



2 ^ 



(5) 



(6) 



The magnification that the lens produces on the source is quanti- 
fied by the determinant of the matrix describing the variation in 
the image position 66 for a small variation in the source position 
6/3 
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H = det 



From Eq. ([TJ, we get 



= 1 - 
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The strong lens regime is most sensitive to the central mass 
of the cluster, where the mass surface density is normally higher 
than the critical surface mass density {k > 1). When the sur- 
face mass density drops significantly below the critical density 
(k « 1), we are in the regime of weak lensing. Weak lensing 
cannot produce multiple images, but useful information about 
the distribution of the mass in the cluster can be extracted from 
the shear of the distortion {y\ and y2)- Differentiating Eq. ([1]), 
we obtain 



H^6ij- 



where 



80i8ej 



l-K-y\ -72 
-72 I- K + yx 



y\(8) = 2^*^11 -'A22), 
yiiff) = 4'n = "All, 



(9) 



(10) 



(11) 



where the double subscripts indicate the second order partial 
derivative. Equations (fTOb and ( fTTI) can be expressed in the com- 
plex notation 



r = ri + 172 



(12) 



to obtain the amplitude and the orientation of the deformation. 
The reduced shear is defined (in complex notation) g = yji l-K). 
The shear measures coherent shape distortions of source galax- 
ies. 

The detection of multiple images and/or the measurement of the 
shear can be used to constrain the mass distribution of the clus- 
ter. In cases where the number of constraints is large, the mass 
of the cluster expressed in Eq. ^ can be reconstructed using a 
non-parametric method. 

2. 1 . Parameter-free lensing reconstruction 

Her e we adopt f ormah sm and notation of iDiego etaP (12005 ah 
and lDiego et"an(l2007h . 

The mass reconstruction described in those papers is based 
on a parameter-free method where the lens plane is divided into a 
finite number of cells A^^ and Eq. ([TJ can be written in algebraic 
form. The deflection angle ff at a position is computed from 



the net contribution of the discretized mass distribution m, at the 
positions 0,- 



D„ V 0-0i 

a{6) = — > mi{0i) 

c2 D,,D„,^ '\0-0i\^ 



(13) 



The number of cells in the gridded mass must be carefully 
choosen. The discretization of the lens plane affects the spatial 
resolution of the mass reconstruction, as we discuss in more de- 
tail later 

All the positions of the pixels hosting a strong lens image 
can be described by the vector 6 of dimension Ng. For each pixel 
in the 6 vector and for a given discretized mass distribution, a 
corresponding y6 pixel can be traced back to the source plane. The 
relation between all these elements can be written in algebraic 
form 



= rM+P, 



(14) 



where (and p) are vectors containing the x and y components 
of the Ng pixels of the arcs (and sources), M is the vector of the 
masses inside the Nc cells, and the matrix T has the dimension of 

2Nfl xN r). The description of this matrix is given in lDiego et al.l 

20053"). 

Eq. ( fT4l ) is a system of 2Ne linear equations whose solu- 
tion can be achieved using the methods described in Diego et 
al. (2005a). The unknowns of the problem are the masses in the 
M vector and the central positions of the background sources. 
Both vectors can be united into a single one X, rendering the 
simpler equation 



' = AX, 



(15) 



where A is a matrix similar to T but with an extra sparse block 
containing 1 and 0. 

Weak lensing data can be modeled in a similar way. The two 
components of the shear are computed through the matrices that 
represent the contribution of each mass cell: 



M. 



(16) 



A detailed description of the matrices T, Ai and A2 is presented 
in Appendix A. 

After including the weak lensing regime, the joint system of 
linear equations can be explicitly written down as 



0x\ 

6y 

n 

72 ) 



A2 



^ 

ly 






P. 



(17) 



where the element ij in the matrix /j is 1 if the 0, pixel comes 
from the Pj source, and is otherwise. The matrix is the null 
matrix. Eq. (fTTT i can be written in the more compact form 



O = TX, 



(18) 



where O is the vector containing the positions of the arcs and 
the shear measurements, F is a non-square matrix, and X is the 
vector of the unknowns. 

Written in this simple form, the lensing problem could, in 
principle, be resolved after the inversion of Eq. (fTST l. X = F 'O. 
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3. Inversion of the lens equation 

The vector X can be found by inverting Eq. ( fTSl l. However, the 
matrix F is often non-invertible. This is actually not a problem 
as we seek an approximate solution with a more physical mean- 
ing than the exact solution. One of the assumptions made in the 
parametric method is that the background galaxies are infinitely 
small. The exact solution of the system of linear equations would 
reproduce an unphysical situation where the background galax- 
ies are point-like. On the other hand, an approximate solution of 
the system has the benefit that the predicted background sources 
are not point-like but extended. In addition, an approximate so- 
lution allows for some error that is needed to compensate for 
the other wrong assumption made in non-parametric methods, 
namely, the assumption that the mass distribution is discretized. 
The predicted size of the background sources can be controlled 
in the solution by setting an error level or residual, R, in the sys- 
tem of linear equations 



(19) 



In the case of WL, the physical meaning of the residual is the 
associated error in the determination of the reduced shear 

As discussed in Diego et al. (2005a), a powerful way to 
find an approximate solution to the system is through the bi- 
conjugate gradient algorithm, which minimizes the square of the 
residual 



R'c^R = (o - rx)'c xa» - rx) 

= (o'c^o - 20'c *rx + x'r'c *rx) , 



(20) 



where C is the covariance matrix of the residual R and among 
other things includes the relative weigh ts of the SL and WL 
data. As discussed in iDiego etaTI (l2007l) . this residual can be 
described (to first order) by a Gaussian distribution with a diag- 
onal covariance matrix. This is however an approximation. The 
elements of the residual are correlated with each other, in partic- 
ular those elements corresponding to the SL part of the data. The 
elements of the WL part of the residual are far more weakly cor- 
related with each other and the diagonal approximation is a far 
more valid for this part. For the time being, we assume that the 
covariance matrix is diagonal and later discuss its implications. 
The diagonal ap proximatio n has been also assumed in previous 
works, including lJee et al] (|2007). The elements of the diagonal 
corresponding to the SL data are set to osl and the elements of 
the diagonal corresponding to the WL data are set to o-wl- We 
adopt (TsL ~ 1 arcsecond (in radians ) and cr^f = 0.3 (or equiv- 
alently 30%). As discussed in Diego et al.l (l2007h . the value of 
osl has a physical meaning. Its value is connected with the an- 
gular size of the sources. 

An alternative to the bi-conjugate gradient is the non- 
negative quadratic programming (QADP). A brief description of 
bi-conjugate and quadratic programming is given in Appendix 
B. 

Both methods have advantages and disadvantages: the bi- 
conjugate gradient is extremely fast, although the final solution 
may contain unphysical negative masses. On the other hand, the 
non-negative quadratic programming algorithm does not pro- 
duce a solution with negative masses, but it is significantly 
slower than the bi-conjugate gradient (its typical computation 
time is a few hours compared with a few minutes to reach simi- 
lar accuracy). In both cases, a threshold R- ^ e is defined to set 
the level at which the minimization stops. 



The method has one drawback when applied to our problem: 
one can not choose e to be arbitrary small. If one chooses e to be 
very small, the algorithm will try to find a solution that focuses 
the arcs into A^s sources with unphysically small sizes. The mass 
distribution that accomplishes this, is usually very biased rela- 
tive to the correct one: it usually has a lot of substructure with 
large mass fluctuations in the lens plane. One must then choose 
e with some carefully selected criteria. Since the algorithm will 
stop when R^ < e, we should choose e to be an estimate of the 
expected error associated with the sources not being point-like 
and the reconstructed mass being discretized. Instead of defining 
€ in terms of R^, the parameter e should be defined in terms of 
the residual of the conjugate gradient algorithm (see Eq. lB.7l 
in Appendix B). This would accelerate the minimization process 
significantly since we would not need to calculate R at each step 
but use the already estimated r^. Both residuals are connected by 
the relation 



r'^R. 



(21) 



Imposing a prior on the size of the sources means that we expect 
the residual of the lens equation, R, to take typical values on the 
order of the expected dispersion (or size) of the sources at the 



measured redshifts. Hence, we can define a /?prior of the form 



(22) 



where the index / runs from 1 to Ng and o-j is the dispersion 
(prior) assumed for the source associated with pixel / and RND 
is a random number normally distributed with zero mean and 
unity variance. We can then estimate e as 



T 

Vk 



prior • 



(23) 



Following Diego et al. (2005a), we construct Rpnoi- assum- 
ing that the source galaxies can be described as Gaussians 
with cr - 30/i"' kpc. In our particular problem (a grid with 
A^c = 32 X 32 cells), this results in a value e 2 x 10"'°. One has 
to be careful not to choose a too small cr. They should be larger 
than the typical size of a galaxy. Only when the number of grid 
points, Nc, is large enough, can the gridded version of the real 
mass distribution focus the arcs into sources that are similar in 
size to real ones. If is not large enough, the gridded version 
of the true mass focuses the arcs into sources that are larger than 
the real sources. This is explained in more detail below. 

The choice of the threshold is a crucial point when perform- 
ing the mass reconstruction. We illustrate in the next few sec- 
tions how this affects both the final mass estimation and the po- 
sitions of the sources. 



4. Simulation of mock lensing data 

We now describe the simulated data consisting of a simple clus- 
ter and lensing (both strong and weak) data set. The use of simu- 
lated data gives us the unique advantage of being able to compare 
the reconstructed mass with the true underlying simulated mass 
and check for biases and systematics. 

For the cluster, we assume a single Navarro-Frenk- White 
(iNavarro et al ] 1 19961 NFW,) profile for the radial density. We 
choose the simplest possible profile in order to avoid the effects 
of the uncertainties caused by the complexity of the mass distri- 
bution. We also assume the same redshift of CL0024 (z = 0.4), 
while the field of view corresponds to the field of view of the 
ACS field (FOV=3.3 arcmins). The resulting mass in the whole 



4 



p.p. Ponente and J.M. Diego: Lensing systematic effects 



field of view is M(FOV) ~ 4.8 x 10 Mq, while when we con- 
sider core radii within 30", we have M(< 30") ~ 1.28 x IO'^^Mq 
(the mass reconstruction in Jee et al. (2007) yields M(r < 30") ~ 
(1.79 + 0.13) X lO'^Mo). 

The strong and weak lensing data are computed using the full 
resolution of our simulated cluster (in the reconstruction pro- 
cess, the lens plane is divided with a grid that effectively reduces 
this resolution). 

For the strong lensing data, we assume the same numbe r of back- 
ground sources (A^^ - 3) identified in lJee et al ] d2007h and that 
their redshifts are zi = 1.675, zi = 1.27, and = 2.84. We 
carefully chose the position of the backgro und sources in tr ying 
to mimic the strong lensing data set used bv lJee et al ] (120071) . al- 
though this is not really relevant to our work. They identify five 
arcs from source 1, two arcs from source 2 and two arcs from 
source 3, making a total of nine. Most of the arcs are tangential, 
particularly those originating from source 1, which indicates that 
this source has to be positioned very close (in projection) to the 
density peak of the lens. In our case, our simulated strong lens- 
ing data set consists of seven arcs, three of which originate from 
source si (two tangential and one radial), two from source S2 
(one tangential and one radial), and two from source 53 (one tan- 
gential and one radial). The map of the lensed images is repre- 
sented in Fig.[T](top panel), with the labels identifying the orig- 
inal sources. 

The shear data is computed assuming that the density of 
available background galaxies is lower toward the center of 
the cluster, where the presence of the cluster itself makes it 
harder to estimate the reduced shear. For all the shear data 
points, we assume a Gaussian noise of 30%. In addition to 
the cluster itself, the magnification bias has to be taken into 
account. Magnification acts on galaxies (enhancing their flux) 
but also expanding the area of the sky behind the cluster 
In feroadhur st et alj ^)05h), the latter effect is estimated and 
sho wed that a net d eficit of background galaxies is expected (see 
also lUmetsu et al.li201 1.) . The resulting shear field is shown in 
Fig.[T](bottom panel). 

4.1. Simulated vs real data 

In lJee etal.1 (|2007|) . the authors consider a FOV of 3.5 x 3.5 ar- 
cminutes that is gridded in a 52 x 52 regular grid, but with the 
four corner points removed. We consider a slightly smaller FOV 
(3.3 x 3.3 arcminutes) and divide the FOV using a 32 x 32 regu- 
lar grid. We chose the side of the grid to be 32 to ensure that the 
number of constraints is comparable to the number of unknowns 
and hence have a more stable system of equations. A larger num- 
ber of grid points will only introduce unnecessary noise in the 
recon structed solution . 

In llee et al ] (120071) . the strong lensing constraints are derived 
from 132 knots identified in the lensed images and the weak lens- 
ing constraints are based on an ensemble of 1297 background 
galaxies with photometric redshifts Zphot > 0.8. In our simulated 
data, we instead consider all the pixels of our lensed images (288 
pixels) for the strong lensing, while for the weak lensing we cre- 
ate a simulated vectorial field in 1301 positions. 
The solution in Jee et al. (2007) is found after a minimization 
process involving the strong and weak lensing data, a regulariza- 
tion term and a model for the lensing potential. The regulariza- 
tion term improves the smoothness of the recovered solution and 
in principle helps to reduce the overfitting problem. The method 
is based on the maximum entropy method (MEM), which has a 
positive prior that forces the improved solution to remain posi- 
tive. Here, we also use a minimization process but instead of a 



1 arcmin 




si 
-S2 



Fig. 1. Top panel: The lensed arcs {9 map) originated from three 
sources in the background (not shown in the figure). The total 
number of pixels forming the arcs is Ne - 288. Bottom panel: 
shear field derived from the lens and used for the weak lensing 
computation. The inner points have been removed to mimic the 
contamination from cluster member galaxies. Total number of 
shear points is A^shear - 1301, needed to set the dimension of the 
lensing matrix. All points have a Gaussian noise of 30%. 



regularization term we stop the minimization process at a point 
that avoids overfittin g the data . An in teresting discussion of this 
point can be found in lJee et aP (l2007l) . They perform a delensing 
of the arcs from one particular source. The resulting recovered 
sources are reported in Fig. 14 in their paper, where the orienta- 
tion, parity, and size of the images are strongly consistent among 
the different recovered sources. Nonetheless, the positions of the 
the delensed images do not overlap. The same authors report: ' 
When we forced the two locations to coincide in our mass recon- 
struction, the smoothness of the resulting mass map was com- 
promised' . This might indicate a tension between the recovered 
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Fig. 2. Simulated observed arcs (black) versus predicted ones 
from the optimal solution (white). The difference between the 
two sets of 9 positions is representative of the error expected 
when recovering the solution. 

solution and the corresponding goodness of fit. Formally the so- 
lution is not an optimal one in the sense that the recovered source 
positions do not coincide but seem to be good enough to ensure 
that the recovered sources resemble the real ones. 

5. The optimal solution 

With the simulated data, a very interesting exercise can be 
done before attempting the mass reconstruction. Since we know 
the true underlying mass and the positions of the background 
sources, we can predict where the arcs should appear when we 
assume the optimal solution possible for X assuming a uniform 
grid with 32 x 32 cells. This solution consists of the mean mass 
in each cell corresponding to the true underlying mass and the 
three real positions. In Fig. |2] we show the true strong lensing 
or 0-map used to reconstruct the mass, compared with the pre- 
dicted one derived from the optimal solution X. The black arcs 
are obtained from the equation 9 - YM + /3, where the matrix F 
is built from the real 9 positions and the 32 x 32 cells, the vector 
jS contains the real positions of the background sources, and the 
vector M contains the mean mass sampled in the 32 x 32 cells. 

The first interesting conclusion we can derive from this ex- 
ercise is that the arcs predicted from the optimal solution differ 
significantly from the true observed arcs. This is unsurprising as 
the optimal solution lacks the resolution of the true underlying 
mass and hence we should expect a different set of strong lensed 
arcs. To reproduce the observed arcs, the solution has to bend 
the light in a different way. This can only be achieved with a 
mass distribution that is different (i.e biased away) from the true 
one. 

This exercise summarizes the entire philosophy behind this 
paper: using a non-parametric method with a uniform cell size, 
it is impossible to predict correctly the strong lensing data with 
an unbiased solution of the true underlying mass. By default, 
the non-parametric method makes the incorrect assumption that 
the mass distribution is discrete and ignores the details of the 
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Fig. 3. Mass reconstruction obtained with the BGA and no over- 
fitting e = 2 X 10"'". Top panel: mass map after smoothing with 
a Gaussian. The mass inside the FOV is M3.3' = 6.1 x IO'^'Mq, 
while the mass inside the core radius of 30" is M30" = 1.39 x 
10''* Mq. Bottom panel: Surface mass density profile (in units of 
Ec;/f) as a function of radius. Darker areas correspond to higher 
masses. 

mass distribution on scales smaller than the cell size. Hence, the 
derived solution has to be biased by the method in order to fit the 
data and compensate for this incorrect assumption. The best we 
can hope for is a solution that resembles the true underlying mass 
distribution but is unable to fit the observed data perfectly. This 
margin of error in the description of the observed data will then 
compensate the original error made by assuming that the mass is 
discretized. However, we note that we seek a solution as close as 
possible to the true solution, which can only be achieved when a 
realistic error, R, is allowed in the minimization of the system of 
linear equations given in Eq. (fT9l l. 

6. Mass reconstruction 

To solve Eq. (fTSl l. the lens plane is divided into a regular grid 
of 32 X 32 cells. This number is smaller than the number of 
constraints provided by the weak and strong lensing data. The 
mass in each cell plus the positions of the background strong 
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Fig.4.PlotsforM3.3- = 4.34x10' and M30" ~ 1.9xlO"*Mo. 
Overfitting case. It shows the solution obtained with the BGA 
when the method is forced to find a nearly exact solution to the 
problem (e - 2x 10 '^). The density profile inside the core radius 
does not follow the profile of the input NFW cluster. Different 
density peaks and dips can be seen around the center of the FOV. 
Darker areas correspond to higher masses. 



lensing galaxies form a vector of unknown variables X that 
has 1030 elements (1024 for the mass cells and 6 for the three 
sources, each one with the x and y coordinates of the position of 
the background galaxy). 



6.1. The bi-conjugate gradient algorithm solution 

The bi-conjugate gradient algorithm (BGA) is a fast and pow- 
erful algorithm for finding the solutions of a system of linear 
equations. As mentioned earlier, rather than finding the exact so- 
lution, we seek an approximated one with an error large enough 
to compensate for the discretized mass and that the background 
galaxies are not point-like. The minimization is stopped at a 
point where « e. The choice of e is based on the physical 
size of the background galaxies and also that the optimal so- 
lution should not reconstruct the data perfectly as discussed in 




■ Bicanjugate Gradient ffs 
Real ea 



Fig. 5. Black color indicates the observed (or true) arcs and in 
white we show the predicted arcs obtained with the solution 
shown in Fig.|4] 

the previous subsection. A value of e can be computed from the 
equation 

f^Z'-k- (24) 

where rk - r^/?sL,prior + r^/?wL,prior contains an estimate of the 
physical size of the background galaxies (/?sL,prior) and the error 
in the weak lensing measurements (/^wL.piim-, see previous sec- 
tions for the definition of e and its relation to r^). 

Once the value of e is estimated, we can solve for the mass 
and position of the background sources. In Fig. |3] we show the 
mass reconstruction obtained with the BGA for a value of e = 1 x 
10"'° (computed in Eq.|24l corresponding to a ctsl ~ 1.2 arcsec 
and cTwL = 0.3 or 30%). The total recovered mass inside the 
FOV is M(< 3.3') = 6.1 X IO'^Mq, while M(r < 30") = 1.39 x 
IO'^Mq. The radial density profile is shown in the bottom panel 
of the figure, where it is compared with the true mass profile. 
Values of e significantly smaller than ~ lO"'" would produce an 
overfitting of the data, introducing systematics in the final mass 
reconstruction. A typical case of overfitting is shown in Fig. |4] 
where the threshold value of e has been lowered several orders 
of magnitude (e = 2 x lO"'*"). This value pushes the solution to 
the limit of the BGA and allows us to predict almost perfectly 
the observed data. However, this solution is clearly biased with 
respect to the true underlying mass as is clear when looking at 
the density profile (bottom panel). 

The mass map shown in Fig.|4]is obviously a poor solution in 
the sense that it deviates significantly from the underlying mass 
distribution. However, from the point of view of the system of 
linear equations it is a good solution because it is able to re- 
produce the data accurately. This is shown in Fig. |5] where the 
observed arcs are compared to the predicted ones by the over- 
fitting solution. This result should be compared with the case in 
Fig.|2]showing the opposite situation where the closest represen- 
tation of the mass distribution leads to an error in the predicted 
strongly lensed arcs. The conclusion we can extract from this 
example is that a simultaneous (unbiased) reconstruction of the 
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Fig. 6. Plots for M^y = 5.92 x IO'^^Mq and M30" = 1.22 x 
IQ^^Mq. Mass reconstruction obtained with the QADP after 100 
iterations. This case corresponds to a reasonable value of e and 
can be compared with the BGA solution shown in Fig. [3] The 
QADP recovers a higher mass in the central region (E/Ecn/) than 
the BGA. 

mass and the lensing data is impossible with a non-parametric 
method that lacks the details of the mass distribution. 

6.2. The quadratic programing algoritiim solution 

The solution X derived from the BGA might predict negative 
masses, which could lead to large fluctuations in the mass den- 
sity profile as the negative fluctuations have to be compen- 
sated for by larger positive fluctuations. However. lHoekstra et al.l 
(1201 k and references within) report that cosmic noise (an in- 
duced shear effect by uncorrected halos and large-scale struc- 
ture) has to be taken into account when estimating the error bars 
in any cluster mass reconstruction that might lead to a negative 
convergence in the regime of the weak lensing. So a negative 
convergence is not completely unrealistic. 

To avoid the large fluctuations at small radii exhibited by the 
biconjugate gradient, which can indicate a non-physical solu- 
tion, we use the quadratic programming algorithm (QADP, see 
Appendix B), which prevents negative masses from appearing 
in the solution. This method resembles the maximum entropy 



method introduced in iJee et alJ (l2007h . since both impose a pos- 
itive prior on the mass. 

The QADP has a smooth behavior in the inner regions, 
where no large fluctuations are found, even in the crucial areas 
of the lens plane where the transition between the WL and SL 
regimes is observed. In addition, QADP provides an indepen- 
dent solution that should agree with the one derived by the BGA. 

The number of iterations of the algorithm can be directly 
related to e. The overfitting solution obtained by the QADP 
algorithm converges only after a large number of iterations 
(~ 10"* - 10^) or equivalently after defining a small value for e. 

In Fig. |6l we show the solution obtained with QADP af- 
ter 100 iterations. This result can be compared with the one in 
Fig. [3] The QADP recovers a higher mass than the BGA in the 
central region. 

In Fig. 121 we show the overfitting case obtained with QADP 
with a large number of iterations (Mter = 10^, or similarly, with 
a very small value for e). In this case, the mass is pushed away 
from the center towards larger radii in a similar way to what was 
observed using the BGA. This is more clearly evident in the 
density profile. A peak in the density is observed at r = 20" and 
an additional bump at r = 50". The way in which the WL and 
SL are weighted is different in both methods. The overfitting 
solution differs significantly from the true mass (and also from 
Jee's reconstruction in the central part). The overfitting solution 
is dominated in our case by the WL part of the data (as in 
iJee et al. I l200l . As shown in Fig. [8] the WL alone case shows a 
mass deficit at the center that is compensated for by t he ring in 
the ou ter regions. Whether a similar situation occurs in lJee et aU 
(l200l is unclear but we note that Jee's mass reconstruction 
predicts a lower mass at th e center than that o f IZitrin et al] 
(2009) (as seen in figure 21 of lUmetsu et"ani2010l) . 

In llee et al. I (l2007l) . their Fig. 10 shows the radial mass den- 
sity profile of the cluster, with a given at a fiducial redshift 
of Zf - 3. The authors state that the resulting profile does not 
match any conventional analytic profile. The density, peaking at 
the center with the value of S/Ec = 1.3, rapidly decreases from 
the center to the end of the core radius at r = 50". The profile 
then remains almost constant around a value of S/Sc - 0.7. Only 
at radius r = 70" from the center is an increment observable, ex- 
tending out to r = 80" with a peak at r = 75". This is what 
the authors refer to as the bump. In two dimensions, this bump 
appears like a ring structure, separated fr om the core by 20". 

A plateau wa s detected b y Jee et (l2007h at r ~ 50", that 
was not found by Umetsu et^ (i2010i) . who instead measured a 
monotonically decreasing density. This plateau might depend on 
the initial guess. The WL part of the data displays this plateau 
more than the SL data, especially in those regions where WL 
constraints are weaker. The role that the prior plays in determin- 
ing the regularization term in the MEM has to be investigated in 
more detail and leaves questions open on how the choice of the 
prior could affect the radii outside the central core. 

7. Discussion and conclusions 

The interesting analysis of I Jee et aP (l2007h appears to detect a 
dark matter ring around the core of CL0024. This ring might 
have been caused by a recent high speed collision between two 
massive clusters along the line of sight. If confirmed, CL0024 
would be an interesting laboratory to test different physical phe- 
nomena. We have explored the possibility that spurious ring-like 
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Fig. 7. M-iy = 6.81 X lO'^^Mo and M^r = 1.59 x lO'l Mass re- 
construction obtained with QADP and after 10^ iterations (over- 
fitting case). The density peaks at r ~ 15" and a bump is ob- 
served at r ~ 50". Darker areas correspond to higher masses. 



structures might appear as a consequence of overfitting lensing 
data in a non-parametric way. We show how the optimal (un- 
biased) solution should produce a fit to the data significantly 
poorer than the minimal solution. This error is necessary to 
account for the initial error introduced when neglecting the im- 
pact of the small-scale fluctuations on the mass distribution. We 
demonstrate our argument by using a simulated data set where 
all the variables are known a priori and the reconstructed mass 
can be compared with the original one. The simulation shows 
how overfitting the data introduces artifacts in the reconstructed 
solution, which can resemble th e ring-like struct ure found in 
iJee et al.1 (l2007h . The methods in iJee et al.1 (l2007l) and the one 
used in this work are diff'erent in some aspects but both methods 
share many common key features such as the lens plane is di- 
vided into a regular grid and the parameters to be constrained are 
basically those for convergence in the pixels. Hence both meth- 
ods should also have the same systematic efifects and in particular 
be sensitive in a similar way to overfitting. 

Another interesting feature shown by the simulations that 
needs to be investigated more (with the actual data) is that when 
the density of weak lensing data is non-uniform across the field 



Fig. 8. Reconstructed image for the case where only weak lens- 
ing data is used in the reconstruction. A clear ring of matter ap- 
pears in the area where the density of weak lensing data gets 
reduced. Whiter colors indicate more mass. 



of view, there is a tendency for the overfitted solution to increase 
the mass density in the areas with fewer weak lensing data. We 
show one example in Fig. [8] where only the simulated weak lens- 
ing data is used to find the solution. The plot shows the WL 
data overlaid on the overfitted solution found for this case. In the 
case of CL0024, we expect a lower density of WL points toward 
the center of the cluster owing to contamination by the cluster 
members. While the SL data constraints the inner central region 
of the cluster, the outer regions are basically constrained by the 
WL data alone. In-between these two regions, the density of WL 
data points should show a gradient, and the effect of the non- 
uniformity of the WL data points might have a negative effect 
on the solution. The reality of the ringlike structure will need to 
be investigated in more detail. 

We note that the covariance matrix of the residu al might not 
necessarily be diagonal. As discussed in section 7 of Dieg o et al.l 
(2007), the elements of the residual are correlated with each 
other, in particular the strong lensing part of the residual. The 
elements of the WL portion of the residual are more weakly cor- 
related with each other, and the diagonal approximation is in this 
case more valid. This is particularly true in our case where the 
error assigned to the WL measurements is the predominant one 
(30%). Since the WL data are more relevant to understanding the 
ring-like structure, we adopt the diagonal approximation for the 
covariance matrix. In addition, the second reaso n why we prefer 
to adopt this approximation in this paper is that iJee et al.l (l2007h 
assumed that the data are uncorrected (the covariance matrix is 
diagonal for an uncorrected residual). The issue of the effect of 
the covariance matrix in lensing reconstruction has not been ad- 
dressed by any method (to the best of our knowledge) and we 
plan to do so in a future pa per. Another int eresting point that de- 
serves discussion is that in lJee et al ] (l2007h a regularization term 
is included in the analysis, among other things, to prevent over- 
fitting. This regularization term, however, does not guarantee 
that overfitting is prevented. The main objective of the regular- 
ization term is to favor solutions that are smooth by introducing 
a prior that represents a smoothed version of the solution. If we 
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consider the extreme case where the reconstructed solution con- 
verges to the prior in their regularization term (this is not an unre- 
alistic scenario because the prior is updated at each iteration and 
based on the previous solution), the regularization term tends to 
zero forcing the other terms in to be even smaller and hence 
closer to an overfitting situation. The SL and WL terms to be 
minimized are the ones that really constrain the model and can 
still be too small even for smooth solutions. Our work shows that 
a good solution obtained with our non-parametric method should 
predict arcs significantly different from the ones observed. Only 
when overfitting is allowed can the reconstructed data closely 
reproduce the observations (see Figs.|2]and|5]above). 

Our work shows the validity and usefulness of non- 
parametric methods but also shows some of its limitations, in 
particular that one should not be too ambitious when fitting the 
data. 
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Appendix A: How is built the F matrix 

The r matrix is the basis of the Weak and Strong Lensing 
Analysis Package (WSLAP) and contains the information about 
how each cell in the grid contributes to either the deflection 
angle or the A:* shear measurement. In the SL case, it also con- 
tains information about the source identity of the pixel in a 
given lensed arc. All this information is organized in rows, each 
row corresponding to one constraint (deflection angle for SL and 
shear for WL). The final structure of F is 



Tx 1 

Yy 1 

Al 

A2 



(A.l) 



The specific form of the T and F matrices depends on the choice 
of basis system. For clarity purposes, we assume that this sys- 
tem is based on Gaussians positioned on the grid. This grid is a 
division of the lens plane into cells, where the mass in a cell is 
assumed to be distributed as a Gaussian of dispersion cr, which 
is proportional to the size of the cell. A proportionality factor 
~ 2 gives very satisfactory results in terms of reproducing the 
constraints. The integrated mass at a given distance 5 from the 
center of the cell is then 



M(S) = 1 - exp(572o-^). 



(A.2) 



Since the basis has circular symmetry, the x and y components of 
the deflection angle a at the same point can be estimated easily 

as 



a AS) = T, = ^[1 - exp(-52/^^2)]^^ 



a^S) = Ty =/l[l -exp(- 



(A.3) 



(A.4) 



where the multiplying constant X contains all the cosmological 
and redshift dependence 



4G 



h-^ rad . 



(A.5) 



The factor (5x in Eq. ( IA.3b is just the difference (in radians) be- 
tween the X position in the arc (x of pixel 0x) and the x position 
of the cell j in the grid (5x = QJS) - ^i(i))- Similarly, we can 

define 5^ = d^i) - 6'^(j) and S = + 5]. 

The Al and A2 matrices can be computed in a similar way 
but in this case, since we need to calculate the derivatives, the 
deflection angles a,^ and o-y have to be computed at three points 
^1, 62, and ^3. The first point, 6\, is the same as 6 above. The 
second and third points (^2 and ^3) are one (or a few) pixel(s) 
left (or right) and up (or down) the pixel at ^i, respectively. Then 
A] is just the difference 



A, - - 



1 [a,{5i) - a,{5x)] - [^(^3) - %(<Ji)] 



pix2rad 



(A.6) 
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A, = 



pix2rad 



pix2rad 



(A.7) 



where pix2rad is the size of the pixel in radians. Since we in- 
cluded the factor lO'^M© in A (see Eq. IA.5I I. the mass in the 
solution vector will be given in lO'^/z 'M© units. The /i ' depen- 
dency exists because in A we have the ratio Dis/(DoiZ)os), which 
goes as h. 

The (null) and 1 (O's and 1 's) matrices on the right side of 
F add 2A^s additional columns. The bottom part of thess columns 
consist entirely of O's since the shear measurements are indepen- 
dent of the position /? of the sources. The Ng x A^s dimensional 
matrices 1 contain I's in the /y positions (/ € [l,Ng],j e [l,A^s]), 
where the 6 pixel comes from the j source and O's elsewhere. 

Appendix B: Minimizing algorithms 

B.1. Biconjugate gradient algorithm or BGA 

The biconjugate gradient (iPress et al.lll997h algorithm is one of 
the fastest and most powerful algorithms for solving systems of 
linear equations. It is also extremely useful for finding approx- 
imate solutions for systems where no exact solutions exists or 
where the exact solution is not the one we are interested in. The 
latter is our case. Given a system of linear equations 



Ax - b. 



(B.l) 



a solution of this system can be found by minimizing the func- 
tion 



fix) — c — bx -\ — X Ax, 
2 



(B.2) 



where c is a constant. The gradient of the Eq. ( IB.2b is when the 
same equation is at its minima 



V/(jc) ^Ax-b^O. 



(B.3) 



That is, at the position of the minimum of the function f(x) we 
find a solution to Eq. (IB. 11 1. In most cases, finding the minimum 
of Eq. (IB. 2b is much easier than finding the solution of the sys- 
tem in lB.ll especially when no exact solution exists for lB.ll or A 
does not have an inverse. 

The biconjugate gradient finds the minimum of Eq. ( IB.2b (or 
equivalently, the solution of Eq. IB. lb by following an iterative 
process that minimizes the function fix) in a series of steps no 
longer than the dimension of the problem. The beauty of the 
algorithm is that the successive minimizations are carried out on 
a series of orthogonal conjugate directions, p^, with respect to 
the metric A. That is. 



PiApi = 



J < I- 



This condition is useful when minimizing in a multidimensional 
space because it guarantees that successive minimizations do not 
spoil the minimizations in previous steps. 

By comparising Eq. (|20] | and Eq. ( IB. 2b . it is easy to identify 
the terms, c - il/2)e^d, b - T"^ and A - T'^F. Minimizing the 
quantity/?^ is equivalent to solving Eq. (fT9] l. To see this, we only 
have to realize that 



b-AX^ r^(0 - FX) = r^R. 



(B.5) 



If an exact solution for Eq. (fT9T l does not exist, the minimum of 
R- will be a more accuratly approximated solution to the system. 
The minimum can be found easily: in the case of symmetric ma- 
trices A, the algorithm constructs two sequences of vectors 



and and two constants, ay_ and yS^. To begin the algorithm, we 
need to make a first guess of the solution, namely Xq and two 
vectors ro and po 



T 



'"k+1 



rk - ffkApk, 



y6k 



,'"k+i 



T 



Pk+i = '"k+1 +y8pk 



(B.6) 
(B.7) 
(B.8) 
(B.9) 



At every iteration, an improved estimate of the solution is given 
by 



Xk+i = Xk -H (TkySk- 



(B.IO) 



The algorithm starts with an initial guess for the solution, X\, 
and chooses the residual and the new search direction in the first 
iteration to be 



n ^b-AXi. 



(B.ll) 



We note that pi is nothing but VR^. Thus, the algorithm chooses 
as a first minimization direction the gradient of the function to 
be minimized at the position of the first guess. It then minimizes 
in directions that are conjugate to the previous ones until either 
it reaches a minimum or the square of the residual R^ is smaller 
than e. 



B.2. Quadratic programming aigoritlnm (QADP) 

The nonnegative quadratic programming algorithm used in this 
work has the peculiarity that it finds solutions, X, satisfying the 
condition X > Q. That is, negative masses are not allowed in the 
solution by constr uction. We follow the multiplicative updates 
proposed bv Sha et~al](l2002h . 

The basic problem we wish to solve is to minimize the quadratic 
function 



Fiv) = Iv'Av + bV, 



(B.12) 



subject to the constraint v,- > 0, V/. In Eq. (IB. 12b . the vector v 
is the unknown vector X, A = F^F and b = F^O. We note that 
the elements of X are all positive, since the /3s can be chosen all 
positive with respect to an appropriate system of reference. The 
matrix A can be decomposed into its positive and negative parts: 

if Aij > and otherwise and 



A = A+ - A 



(B.4) A:. 



where A^ - Aij 



-Aij if Aij < and otherwise inonnegative matrices). 



The solution is iteratively updated by the rule 



t-'k+l,i 



where the updating term is defined as 



6, = 



-h+ ^/72+4(A+v),(A-v), 
2(A+v). ■ 



(B.13) 



(B.14) 



It is easy to see that generic quadratic programming prob- 
lems have a single unique minimum. We denote as v* this global 
minimum of Fiv). We attempt to prove that convergence of the 
iteration Eq. ( IB. 14b corresponds to this minimum v*. At this 
point, one of two conditions must apply for each component 
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V*: either (i) v* > and dF/dviiv*) = or (ii) v* = and 
dFldviiv*) > 0. Now since 

dF 

— =(A^v)i-(A-v)i + fci, (B.15) 

V* 

the multiplicative updates in both cases (i) and (ii) take the value 
6i - I, where the minimum is a fixed point. Conversely, a fixed 
point of the iteration must be the minimum v*. 
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